Working with custom transcriptomes using factR2

Introduction

Many eukaryotic genes undergo alternative splicing to produce multiple RNA isoforms, thus expanding the coding and regulatory potential of the genome. The transcriptome of specific biological samples are constructed from high throughput sequencing by mapping short or long reads to a reference genome and assembling alignments into transcript architectures. The resulting Gene Transfer Format (GTF) files describe newly identified transcripts as sets of exonic coordinates, but they do not contain information about the coding sequences (CDSs; also known as ORFs) or possible biological functions of these transcripts. We previously published {factR}, an R package for the construction of CDSs for novel RNA isoforms. {factR} also predicts the domain organization of the protein products of these isoforms, and it identifies transcripts that are susceptible to nonsense-mediated decay (NMD; a pathway destabilizing mRNAs with premature translation termination codons).

This new version of {factR}, called {factR2}, extends the functionality of its predecessor by adding tools to probe for the regulatory potential of alternative exons and enhances its user-friendliness by adopting an R object class. {factR2} can pinpoint splicing events that trigger NMD, and it can also test the regulation of alternative splicing by correlating exon inclusion with gene expression levels. Additional features of {factR2} include direct quantification of exonic sequence conservation scores and an interactive visualization of transcript and domain architectures.

Getting started

Installing {factR2}

{factR2} is currently available on GitHub and can be installed using {devtools}:

# install.packages("devtools")
devtools::install_github("f-hamidlab/factR2")

Materials needed

{factR2} only requires a custom transcriptome file in GTF format as input. This file is typically generated by short-read assemblers such as StringTie2 and Cufflinks, or by long-read aligners such as minimap2 and Isoquant. We provide a sample custom transcriptome file assembled from PacBio HiFi data using Isoquant that can be used to test {factR2} functionalities (see “Section” for more details).

Several {factR2} functions require reference genomic sequences and transcript annotations. To make it easy for users to obtain these files, we provide a convenient retrieval system for reference files of common model organisms from the GENCODE and Ensembl databases. Users may optionally use their own reference files as inputs.

Using factR2

The following packages are required to follow along this workflow:

library(factR2)
library(tidyverse)
library(rtracklayer)

Creating a factRObject

Input custom transcriptome data are stored and analysed as part of an S4 class structure called factRObject. To create a factRObject, users can provide the local path to their custom GTF file, or a {GenomicRanges} object containing transcript architecture information of custom transcriptome. {factR2} comes with a sample GTF file assembled from long-read sequences of transcripts from postnatal mouse brain cortex (reference). The local path to this GTF file can be retrieved as such:

gtf.file <-  system.file("extdata/pb_custom.gtf.gz", package = "factR2")

In addition to a custom GTF file, users are required to define the genome used for read assembly. This is done by providing input to the reference argument and accepted inputs include the name of the organism (“Homo sapiens” or “Human”) or the specific ID of {factR2} supported genomes. This list can be retrieved by typing the function listSupportedGenomes() in console as such:

listSupportedGenomes()
#>      ID      species database release.date
#> 1   v44 Homo sapiens  GENCODE       7.2023
#> 2   v43 Homo sapiens  GENCODE       2.2023
#> 3   v42 Homo sapiens  GENCODE      10.2022
#> 4   v41 Homo sapiens  GENCODE       7.2022
#> 5   v40 Homo sapiens  GENCODE       4.2022
#> 6   v39 Homo sapiens  GENCODE      12.2021
#> 7   v38 Homo sapiens  GENCODE       5.2021
#> 8   v37 Homo sapiens  GENCODE       2.2021
#> 9   v36 Homo sapiens  GENCODE      10.2020
#> 10  v35 Homo sapiens  GENCODE       8.2020
#> 11  v34 Homo sapiens  GENCODE       4.2020
#> 12  v33 Homo sapiens  GENCODE       1.2020
#> 13  v32 Homo sapiens  GENCODE       9.2019
#> 14  v31 Homo sapiens  GENCODE       6.2019
#> 15  v30 Homo sapiens  GENCODE       4.2019
#> 16  v29 Homo sapiens  GENCODE      10.2018
#> 17  v28 Homo sapiens  GENCODE       4.2018
#> 18  v27 Homo sapiens  GENCODE       8.2017
#> 19  v26 Homo sapiens  GENCODE       3.2017
#> 20  v25 Homo sapiens  GENCODE       7.2016
#> 21  v24 Homo sapiens  GENCODE      12.2015
#> 22  v23 Homo sapiens  GENCODE       7.2015
#> 23  v22 Homo sapiens  GENCODE       3.2015
#> 24  v21 Homo sapiens  GENCODE      10.2014
#> 25  v20 Homo sapiens  GENCODE       8.2014
#> 26  v19 Homo sapiens  GENCODE      12.2013
#> 27 vM33 Mus musculus  GENCODE       7.2023
#> 28 vM32 Mus musculus  GENCODE       2.2023
#> 29 vM31 Mus musculus  GENCODE      10.2022
#> 30 vM30 Mus musculus  GENCODE       7.2022
#> 31 vM29 Mus musculus  GENCODE       4.2022
#> 32 vM28 Mus musculus  GENCODE      12.2021
#> 33 vM27 Mus musculus  GENCODE       5.2021
#> 34 vM26 Mus musculus  GENCODE       2.2021
#> 35 vM25 Mus musculus  GENCODE       4.2020

With these inputs, a factRObject can be created as follows:

fobj <- createfactRObject(gtf.file, reference = "vM33")
#> 🡆  Checking inputs
#> 🡆  Checking factRObject
#> 🡆  Adding custom transcriptome
#>     ℹ Importing from local directory
#> 🡆  Adding annotation
#>     ℹ Importing from URL
#> 🡆  Adding genome sequence
#>     ℹ Using BSgenome object
#> 🡆  Matching chromosome names
#> 🡆  Matching gene information
#>     Number of mismatched gene_ids found: 83520
#>     --> Attempting to match ensembl gene_ids...
#>     --> No ensembl gene ids found in query
#>     ---> Attempting to match gene_ids by finding overlapping coordinates...
#>     ---> 65203 gene_id matched
#>     Total gene_ids corrected: 65203
#>     Remaining number of mismatched gene_ids: 18317
#> 🡆  Creating factRset objects
#> 
#>     ℹ Adding gene information
#> 
#>     ℹ Adding transcript information
#> 
#>     ℹ Adding alternative splicing information
#> 
#> 🡆  Annotating novel transcripts
#> 
#> 🡆  Annotating novel AS events
#> 
#> ℹ factRobject created!

The function above extracts and stores gene-, transcript- and exon-level information from a custom GTF file and, by default, checks for inconsistencies in gene annotation to the reference transcriptome.

Interacting with a factRObject

{factR2} comes with many functions to access the information stored within a factRObject. A preview of the object’s contents can be printed by typing the variable name:

fobj
#> class: factRObject [version 0.99.0]
#> # transcriptome:
#>    36931 genes; 
#>    115055 transcripts [99855 novel]; 
#>    0 coding transcripts 
#> # active set: AS
#> # samples (0):

Transcripts that contain distinct sets of internal exons are defined as “novel”. Note that input custom transcriptome lack coding sequence information and thus, none of the transcripts are currently annotated as “coding”.

Metadata of genes, transcripts and exons are segregated into individual “set” data in a factRObject. At any one time, only one of this “set” will be assigned as the active “set”. Users may switch to a different “set” by assigning the name of the new “set” to the activeSet function:

activeSet(fobj) <- "transcript"

The exact names of the “set” can be printed out using listSets():

listSets(fobj)
#> [1] "gene"       "transcript" "AS"

The metadata associated with the current active “set” can be displayed as a dataframe using the features() function:

features(fobj)
#> # A tibble: 115,055 × 8
#>    transcript_id              gene_id   gene_name strand width novel cds   nmd  
#>    <chr>                      <chr>     <chr>     <fct>  <int> <chr> <chr> <chr>
#>  1 transcript51190.chr3.nnic  ENSMUSG0… Gnai3     -       1333 yes   no    no   
#>  2 transcript62347.chr16.nnic ENSMUSG0… Cdc45     -       2279 yes   no    no   
#>  3 transcript62383.chr16.nnic ENSMUSG0… Cdc45     -        679 yes   no    no   
#>  4 transcript62392.chr16.nnic ENSMUSG0… Cdc45     -       2282 yes   no    no   
#>  5 transcript45445.chrX.nnic  ENSMUSG0… Scml2     +       1237 yes   no    no   
#>  6 transcript45454.chrX.nnic  ENSMUSG0… Scml2     +        677 yes   no    no   
#>  7 transcript45461.chrX.nnic  ENSMUSG0… Scml2     +        451 yes   no    no   
#>  8 transcript45463.chrX.nnic  ENSMUSG0… Scml2     +        669 yes   no    no   
#>  9 transcript45465.chrX.nnic  ENSMUSG0… Scml2     +        464 yes   no    no   
#> 10 transcript45467.chrX.nnic  ENSMUSG0… Scml2     +        425 yes   no    no   
#> # ℹ 115,045 more rows

To display the transcripts from your desired genes, users can provide IDs or names of genes as input to the second argument of features(). For example, to query for the transcripts from U2af1 gene, we can type:

features(fobj, "U2af1")
#> # A tibble: 5 × 8
#>   transcript_id              gene_id    gene_name strand width novel cds   nmd  
#>   <chr>                      <chr>      <chr>     <fct>  <int> <chr> <chr> <chr>
#> 1 transcript80929.chr17.nnic ENSMUSG00… U2af1     -       1038 no    no    no   
#> 2 transcript80960.chr17.nnic ENSMUSG00… U2af1     -        971 no    no    no   
#> 3 transcript80965.chr17.nnic ENSMUSG00… U2af1     -        971 no    no    no   
#> 4 transcript81001.chr17.nnic ENSMUSG00… U2af1     -        842 yes   no    no   
#> 5 transcript81044.chr17.nnic ENSMUSG00… U2af1     -        785 no    no    no

Multiple genes can be displayed by providing additional gene names:

features(fobj, "U2af1", "U2af2")
#> # A tibble: 9 × 8
#>   transcript_id              gene_id    gene_name strand width novel cds   nmd  
#>   <chr>                      <chr>      <chr>     <fct>  <int> <chr> <chr> <chr>
#> 1 transcript2976.chr7.nnic   ENSMUSG00… U2af2     +       1191 no    no    no   
#> 2 transcript2982.chr7.nnic   ENSMUSG00… U2af2     +       1800 yes   no    no   
#> 3 transcript3019.chr7.nnic   ENSMUSG00… U2af2     +        790 yes   no    no   
#> 4 transcript3021.chr7.nnic   ENSMUSG00… U2af2     +       1354 yes   no    no   
#> 5 transcript80929.chr17.nnic ENSMUSG00… U2af1     -       1038 no    no    no   
#> 6 transcript80960.chr17.nnic ENSMUSG00… U2af1     -        971 no    no    no   
#> 7 transcript80965.chr17.nnic ENSMUSG00… U2af1     -        971 no    no    no   
#> 8 transcript81001.chr17.nnic ENSMUSG00… U2af1     -        842 yes   no    no   
#> 9 transcript81044.chr17.nnic ENSMUSG00… U2af1     -        785 no    no    no

To quickly display the metadata of a different “set”, users may modify the set argument:

features(fobj, "U2af1", set = "AS")
#> ℹ Set `show_more to TRUE to show more info`
#> # A tibble: 3 × 8
#>   AS_id   gene_id               gene_name coord        AStype strand width novel
#>   <chr>   <chr>                 <chr>     <chr>        <fct>  <fct>  <int> <chr>
#> 1 AS10417 ENSMUSG00000061613.14 U2af1     chr17:31870… CE     -         67 no   
#> 2 AS10419 ENSMUSG00000061613.14 U2af1     chr17:31871… CE     -         67 no   
#> 3 AS10421 ENSMUSG00000061613.14 U2af1     chr17:31873… CE     -         88 no

or use a convenient wrapper function (see ?factR-meta) for more other wrapper:

ase(fobj, "U2af1")
#> ℹ Set `show_more to TRUE to show more info`
#> # A tibble: 3 × 8
#>   AS_id   gene_id               gene_name coord        AStype strand width novel
#>   <chr>   <chr>                 <chr>     <chr>        <fct>  <fct>  <int> <chr>
#> 1 AS10417 ENSMUSG00000061613.14 U2af1     chr17:31870… CE     -         67 no   
#> 2 AS10419 ENSMUSG00000061613.14 U2af1     chr17:31871… CE     -         67 no   
#> 3 AS10421 ENSMUSG00000061613.14 U2af1     chr17:31873… CE     -         88 no

TALK ABOUT AS_ids

The dataframe ouputs of features() and other convinient wrappers can be assigned to a new variable for further wrangling or directly piped to a downstream data manipulation function. In the example below, we show how to quickly count the number of events by the splicing type (AStype):

ase(fobj) %>% 
  group_by(AStype) %>% 
  tally()
#> ℹ Set `show_more to TRUE to show more info`
#> # A tibble: 6 × 2
#>   AStype     n
#>   <fct>  <int>
#> 1 CE      9298
#> 2 AD      8134
#> 3 AA      6391
#> 4 AF     11099
#> 5 AL      8705
#> 6 RI      4570

{factR2} now c

comes with improvements to the visualisation of transcript architectures.

plotTranscripts(fobj, "U2af1")
3186600031869000318720003187500031878000transcript80929.chr17.nnictranscript80960.chr17.nnictranscript80965.chr17.nnictranscript81001.chr17.nnictranscript81044.chr17.nnic
Genome postion (chr17)

Exon structures may sometimes appear thin and inconspicuous especially when the flanking introms are extremely long. To focus on the exon architectures, users may set the rescale_introns argument to TRUE:

plotTranscripts(fobj, "U2af1", rescale_introns = T)
050010001500transcript80929.chr17.nnictranscript80960.chr17.nnictranscript80965.chr17.nnictranscript81001.chr17.nnictranscript81044.chr17.nnic
Relative position (chr17)

Running factR pipeline

{factR2} contains 5 key functions to probe for biological function of custom transcripts:

  1. buildCDS(): to construct CDS information
  2. getAAsequence(): to translate amino acid sequences
  3. predictNMD(): to predict for NMD-sensitive transcripts
  4. testASNMDevents(): to pinpoint splicing events leading to NMD
  5. getAScons(): to quantify exon-level sequence conservation scores

The above functions can be simultaneously performed by running the runfactR() pipeline and this will update the metadata stored within the factRObject:

fobj <- runfactR(fobj)
#> 🡆  Building CDS information
#>     Searching for reference mRNAs in query
#>     16 reference mRNAs found and its CDS were assigned
#>     Building database of annotated ATG codons
#>     Selecting best ATG start codon for remaining transcripts and determining open-reading frame
#>     44009 new CDSs constructed
#> 
#>     Summary: Out of 115056 transcripts in `gtf`,
#>     44025 transcript CDSs were built
#>     ℹ Updating transcript feature data
#> 
#> 🡆  Running transcript-level NMD prediction
#> 
#>     Predicting NMD sensitivities for 44025 mRNAs
#>     ℹ Updating transcript feature data
#> 
#> 🡆  Translating amino acid sequences
#> 
#> 🡆  Running AS-NMD testing
#> 
#>     ℹ Getting best reference for AS-NMD testing
#> 
#>     ℹ Testing AS-NMD exons
#> 
#>     ℹ Updating AS feature data
#> 
#> 🡆  Quantifiying conservation scores
#> 
#>     ℹ Retrieving phastCons35way.UCSC.mm39 scores
#> 
#>     ℹ Quantifying `exon` conservation scores with 0 padding

The number of newly-identified coding transcripts is now reflected in the object summary:

fobj
#> class: factRObject [version 0.99.0]
#> # transcriptome:
#>    36931 genes; 
#>    115055 transcripts [99855 novel]; 
#>    44025 coding transcripts 
#> # active set: transcript
#> # samples (0):

Coding transcripts that are predicted to be NMD-sensitive are now annotated as “yes” under the column “nmd” in the transcript metadata:

features(fobj, "U2af1") 
#> # A tibble: 5 × 12
#>   transcript_id  gene_id gene_name strand width novel cds   nmd   stop_to_lastEJ
#>   <chr>          <chr>   <chr>     <fct>  <int> <chr> <chr> <chr>          <dbl>
#> 1 transcript809… ENSMUS… U2af1     -       1038 no    yes   yes              417
#> 2 transcript809… ENSMUS… U2af1     -        971 no    yes   no              -142
#> 3 transcript809… ENSMUS… U2af1     -        971 no    yes   no              -142
#> 4 transcript810… ENSMUS… U2af1     -        842 yes   yes   yes              430
#> 5 transcript810… ENSMUS… U2af1     -        785 no    yes   yes              342
#> # ℹ 3 more variables: num_of_downEJs <int>, `3'UTR_length` <dbl>, is_NMD <lgl>

The total number of NMD-sensitive transcripts can be queried as such:

features(fobj) %>% 
  group_by(nmd) %>% 
  tally()
#> ℹ Set `show_more to TRUE to show more info`
#> # A tibble: 2 × 2
#>   nmd        n
#>   <chr>  <int>
#> 1 no    111665
#> 2 yes     3390

Now, plotTranscripts() will distinguish between CDS and untranslated region (UTR) segments:

plotTranscripts(fobj, "U2af1", rescale_introns = T)
050010001500transcript80929.chr17.nnictranscript80960.chr17.nnictranscript80965.chr17.nnictranscript81001.chr17.nnictranscript81044.chr17.nnic
Relative position (chr17)

By default, the runfactR pipeline will compute the sequence conservation scores of the entire exon. To determine the conservation of intronic sequences flanking these alternative exons (50 base pairs on each side), users may rerun the getAScons() function and the newly-computed scores will be reflected as a separate column in the “AS” set metadata:

fobj <- getAScons(fobj, type = "flanks", padding = 50)
#> 🡆  Quantifiying conservation scores
#>     ℹ Retrieving phastCons35way.UCSC.mm39 scores
#>     ℹ Quantifying `flanks` conservation scores with 50 padding
ase(fobj)
#> ℹ Set `show_more to TRUE to show more info`
#> # A tibble: 48,197 × 12
#>    AS_id   gene_id           gene_name coord AStype strand width novel ASNMDtype
#>    <chr>   <chr>             <chr>     <chr> <fct>  <fct>  <int> <chr> <chr>    
#>  1 AS01129 ENSMUSG000000338… Tcea1     chr1… AD     +        231 yes   <NA>     
#>  2 AS01642 ENSMUSG000000259… Rb1cc1    chr1… AD     +       1104 yes   <NA>     
#>  3 AS01647 ENSMUSG000000259… Rb1cc1    chr1… AF     +        499 yes   <NA>     
#>  4 AS01649 ENSMUSG000000259… Rb1cc1    chr1… RI     +        141 yes   <NA>     
#>  5 AS01650 ENSMUSG000000259… Rb1cc1    chr1… RI     +        178 yes   <NA>     
#>  6 AS01651 ENSMUSG000000259… Rb1cc1    chr1… RI     +         80 yes   <NA>     
#>  7 AS01652 ENSMUSG000000259… Rb1cc1    chr1… RI     +         87 yes   <NA>     
#>  8 AS01657 ENSMUSG000000259… Rb1cc1    chr1… AD     +        749 yes   <NA>     
#>  9 AS01658 ENSMUSG000000259… Rb1cc1    chr1… RI     +         81 yes   <NA>     
#> 10 AS01659 ENSMUSG000000259… Rb1cc1    chr1… AD     +        173 yes   <NA>     
#> # ℹ 48,187 more rows
#> # ℹ 3 more variables: ASNMD.in.cds <lgl>, Cons.exon.pad0 <dbl>,
#> #   Cons.flanks.pad50 <dbl>

Testing regulatory function of AS-NMD events

Upon closer inspection of the alternative events from the U2af1 gene, we found 2 out of 3 exons that are predicted to trigger NMD in its host transcript. These AS-NMD events can be classified as “Stimulating” if the longer form (exon inclusion) of the transcript is NMD-sensitive or “Repressing” if the shorter form (exon skipping) of the transcript is NMD-sensitive.

ase(fobj, "U2af1")
#> # A tibble: 3 × 12
#>   AS_id gene_id gene_name coord AStype strand width novel ASNMDtype ASNMD.in.cds
#>   <chr> <chr>   <chr>     <chr> <fct>  <fct>  <int> <chr> <chr>     <lgl>       
#> 1 AS10… ENSMUS… U2af1     chr1… CE     -         67 no    <NA>      NA          
#> 2 AS10… ENSMUS… U2af1     chr1… CE     -         67 no    Stimulat… TRUE        
#> 3 AS10… ENSMUS… U2af1     chr1… CE     -         88 no    Repressi… TRUE        
#> # ℹ 2 more variables: Cons.exon.pad0 <dbl>, Cons.flanks.pad50 <dbl>

This can be visually verified by plotting the transcripts: ERROR PROPS UP HERE:

plotTranscripts(fobj, "AS10419")

To further test the regulatory function of these AS-NMD events, we can correlate the exon inclusion levels (PSI or Percent Spliced In) of AS-NMD events to its gene expression level. {factR2} has capabilities to compute gene expression counts and PSI values from transcript-level expression count matrix. To demonstrate this feature, we provide a sample transcript count matrix of all detected isoforms detected in postnatal mouse brain at two development timepoints. This count matrix can be added to the factRObject using the addTxCounts() function:

counts <- system.file("extdata/pb_expression.tsv.gz", package = "factR2")
fobj <- addTxCounts(fobj, counts)
#> 🡆  Adding expression data
#>     ℹ Importing from local file
#> 🡆  Creating samples metadata
#> 🡆  Processing expression data
#>     ℹ Adding gene counts
#>     ℹ Adding spliced-event counts
#>     ℹ Normalizing counts

Users may choose to use pre-computed PSI values from other splicing quantification tool by providing the path to the PSI matrix to the psi argument.

To test the correlation between exon PSI values and gene expression levels, users may use the testGeneCorr() function. By default, this function will perform a two-sided Pearson’s product moment correlation on variance-stabilised PSI and normalised gene expression expression. Users may customise the parameters of the correlation test by parsing addition arguments to the R cor.test() function. In the example below, we show how one would modify the testGeneCorr() function to carry out one-sided test:

fobj <- testGeneCorr(fobj,alternative="greater")
ase(fobj, "U2af1")
#> # A tibble: 3 × 14
#>   AS_id gene_id gene_name coord AStype strand width novel ASNMDtype ASNMD.in.cds
#>   <chr> <chr>   <chr>     <chr> <fct>  <fct>  <int> <chr> <chr>     <lgl>       
#> 1 AS10… ENSMUS… U2af1     chr1… CE     -         67 no    <NA>      NA          
#> 2 AS10… ENSMUS… U2af1     chr1… CE     -         67 no    Stimulat… TRUE        
#> 3 AS10… ENSMUS… U2af1     chr1… CE     -         88 no    Repressi… TRUE        
#> # ℹ 4 more variables: Cons.exon.pad0 <dbl>, Cons.flanks.pad50 <dbl>,
#> #   gene.cor.estimate <dbl>, gene.cor.pval <dbl>
.plot2way(fobj, "AS10419", "U2af1")
ase(fobj) %>% 
  filter(gene.cor.pval<0.05, 
         ASNMDtype=="Stimulating", 
         AStype=="CE") %>% arrange(desc(gene.cor.estimate))
#> ℹ Set `show_more to TRUE to show more info`
#> # A tibble: 42 × 14
#>    AS_id   gene_id           gene_name coord AStype strand width novel ASNMDtype
#>    <chr>   <chr>             <chr>     <chr> <fct>  <fct>  <int> <chr> <chr>    
#>  1 AS00959 ENSMUSG000000595… Uqcr10    chr1… CE     -         86 no    Stimulat…
#>  2 AS08345 ENSMUSG000000241… Rgs11     chr1… CE     +        922 yes   Stimulat…
#>  3 AS10419 ENSMUSG000000616… U2af1     chr1… CE     -         67 no    Stimulat…
#>  4 AS24284 ENSMUSG000000677… Xlr4b     chrX… CE     +         79 yes   Stimulat…
#>  5 AS23193 ENSMUSG000000187… Mpdu1     chr1… CE     -        121 no    Stimulat…
#>  6 AS41094 ENSMUSG000000355… Ankrd11   chr8… CE     -         97 no    Stimulat…
#>  7 AS31067 ENSMUSG000000293… Mthfd2l   chr5… CE     +         34 no    Stimulat…
#>  8 AS30603 ENSMUSG000000199… Uhrf1bp1l chr1… CE     +        134 no    Stimulat…
#>  9 AS15106 ENSMUSG000000511… Garin5a   chr7… CE     +        341 yes   Stimulat…
#> 10 AS18807 ENSMUSG000000214… Pcbd2     chr1… CE     +        140 yes   Stimulat…
#> # ℹ 32 more rows
#> # ℹ 5 more variables: ASNMD.in.cds <lgl>, Cons.exon.pad0 <dbl>,
#> #   Cons.flanks.pad50 <dbl>, gene.cor.estimate <dbl>, gene.cor.pval <dbl>

Predicting protein domains

Print AA sequence?

plotDomains(fobj, "U2af1")
#> ℹ Set `show_more to TRUE to show more info`
#> ℹ Set `show_more to TRUE to show more info`

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] BSgenome.Mmusculus.UCSC.mm39_1.4.3 BSgenome_1.66.3                   
#>  [3] Biostrings_2.66.0                  XVector_0.38.0                    
#>  [5] rtracklayer_1.58.0                 GenomicRanges_1.50.2              
#>  [7] GenomeInfoDb_1.34.9                IRanges_2.32.0                    
#>  [9] S4Vectors_0.36.2                   BiocGenerics_0.44.0               
#> [11] lubridate_1.9.2                    forcats_1.0.0                     
#> [13] stringr_1.5.0                      dplyr_1.1.2                       
#> [15] purrr_1.0.2                        readr_2.1.4                       
#> [17] tidyr_1.3.0                        tibble_3.2.1                      
#> [19] ggplot2_3.4.3                      tidyverse_2.0.0                   
#> [21] factR2_0.99.1                     
#> 
#> loaded via a namespace (and not attached):
#>   [1] colorspace_2.1-0              rjson_0.2.21                 
#>   [3] ellipsis_0.3.2                rstudioapi_0.15.0            
#>   [5] farver_2.1.1                  bit64_4.0.5                  
#>   [7] interactiveDisplayBase_1.36.0 AnnotationDbi_1.60.2         
#>   [9] fansi_1.0.4                   xml2_1.3.5                   
#>  [11] codetools_0.2-19              cachem_1.0.8                 
#>  [13] geneplotter_1.76.0            knitr_1.43                   
#>  [15] jsonlite_1.8.7                Rsamtools_2.14.0             
#>  [17] annotate_1.76.0               dbplyr_2.3.3                 
#>  [19] png_0.1-8                     shiny_1.7.5                  
#>  [21] HDF5Array_1.26.0              BiocManager_1.30.22          
#>  [23] compiler_4.2.1                httr_1.4.7                   
#>  [25] Matrix_1.6-1                  fastmap_1.1.1                
#>  [27] lazyeval_0.2.2                cli_3.6.1                    
#>  [29] later_1.3.1                   htmltools_0.5.6              
#>  [31] prettyunits_1.1.1             tools_4.2.1                  
#>  [33] gtable_0.3.4                  glue_1.6.2                   
#>  [35] GenomeInfoDbData_1.2.9        rappdirs_0.3.3               
#>  [37] Rcpp_1.0.11                   Biobase_2.58.0               
#>  [39] jquerylib_0.1.4               vctrs_0.6.3                  
#>  [41] rhdf5filters_1.10.1           crosstalk_1.2.0              
#>  [43] xfun_0.40                     mime_0.12                    
#>  [45] timechange_0.2.0              lifecycle_1.0.3              
#>  [47] restfulr_0.0.15               XML_3.99-0.14                
#>  [49] AnnotationHub_3.6.0           zlibbioc_1.44.0              
#>  [51] scales_1.2.1                  promises_1.2.1               
#>  [53] hms_1.1.3                     MatrixGenerics_1.10.0        
#>  [55] parallel_4.2.1                SummarizedExperiment_1.28.0  
#>  [57] rhdf5_2.42.1                  RColorBrewer_1.1-3           
#>  [59] yaml_2.3.7                    curl_5.0.2                   
#>  [61] memoise_2.0.1                 pbapply_1.7-2                
#>  [63] sass_0.4.7                    biomaRt_2.54.1               
#>  [65] stringi_1.7.12                RSQLite_2.3.1                
#>  [67] highr_0.10                    BiocVersion_3.16.0           
#>  [69] BiocIO_1.8.0                  GenomicFeatures_1.50.4       
#>  [71] filelock_1.0.2                BiocParallel_1.32.6          
#>  [73] rlang_1.1.1                   pkgconfig_2.0.3              
#>  [75] matrixStats_1.0.0             bitops_1.0-7                 
#>  [77] evaluate_0.21                 lattice_0.21-8               
#>  [79] Rhdf5lib_1.20.0               patchwork_1.1.3              
#>  [81] GenomicAlignments_1.34.1      htmlwidgets_1.6.2            
#>  [83] labeling_0.4.2                bit_4.0.5                    
#>  [85] tidyselect_1.2.0              DESeq2_1.38.3                
#>  [87] magrittr_2.0.3                factR_1.0.0                  
#>  [89] R6_2.5.1                      generics_0.1.3               
#>  [91] DelayedArray_0.24.0           DBI_1.1.3                    
#>  [93] pillar_1.9.0                  withr_2.5.0                  
#>  [95] KEGGREST_1.38.0               RCurl_1.98-1.12              
#>  [97] crayon_1.5.2                  utf8_1.2.3                   
#>  [99] BiocFileCache_2.6.1           plotly_4.10.2                
#> [101] tzdb_0.4.0                    rmarkdown_2.24               
#> [103] progress_1.2.2                locfit_1.5-9.8               
#> [105] grid_4.2.1                    data.table_1.14.8            
#> [107] blob_1.2.4                    GenomicScores_2.10.0         
#> [109] digest_0.6.33                 xtable_1.8-4                 
#> [111] httpuv_1.6.11                 munsell_0.5.0                
#> [113] viridisLite_0.4.2             bslib_0.5.1